Unsupervised Feature Selection for Interval Ordered Information Systems
YAN Yuejun, DAI Jianhua
School of Computer Science and Technology, Tianjin University, Tianjin 300350
Abstract: Many unsupervised feature selection methods have been proposed for single-valued information systems, but little research addresses unsupervised feature selection for interval-valued information systems. In this paper, a fuzzy dominance relation is proposed for interval ordered information systems. Fuzzy rank information entropy and fuzzy rank mutual information are then extended to evaluate the importance of features. On this basis, an unsupervised feature selection method is designed using an unsupervised maximum information and minimum redundancy (UmImR) criterion, which takes into account both the amount of information a feature carries and its redundancy with other features. Experimental results demonstrate the effectiveness of the proposed method.
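The abstract does not give the formulas, so the following is only a rough sketch of how such a pipeline could fit together, not the authors' definitions. The `fuzzy_dominance` relation (a linear relaxation of "both interval bounds are at least as large"), the entropy and mutual information formulas, the greedy score (entropy minus mean redundancy), and all function names are assumptions made for illustration.

```python
import math

def fuzzy_dominance(lo, hi, scale=1.0):
    """Hypothetical fuzzy dominance matrix for one interval-valued feature.

    r[i][j] is the degree to which object j dominates object i, i.e. how well
    lo[j] >= lo[i] and hi[j] >= hi[i] both hold. It is 1 when both hold and
    decays linearly to 0 as either bound falls short by `scale` (scale > 0).
    """
    def degree(d):
        return min(1.0, max(0.0, 1.0 + d / scale))
    n = len(lo)
    return [[min(degree(lo[j] - lo[i]), degree(hi[j] - hi[i]))
             for j in range(n)] for i in range(n)]

def rank_entropy(r):
    """Assumed fuzzy rank entropy: -(1/n) * sum_i log2(|dominating set of i| / n)."""
    n = len(r)
    return -sum(math.log2(sum(row) / n) for row in r) / n

def rank_mutual_info(r1, r2):
    """Assumed fuzzy rank mutual information; intersection taken as element-wise min."""
    n = len(r1)
    total = 0.0
    for row1, row2 in zip(r1, r2):
        joint = sum(min(a, b) for a, b in zip(row1, row2))  # >= 1, since r[i][i] == 1
        total += math.log2(sum(row1) * sum(row2) / (n * joint))
    return -total / n

def umimr_select(features, n_select, scale=1.0):
    """Greedy sketch: start from the highest-entropy feature, then repeatedly add
    the feature with the largest (entropy - mean redundancy with selected ones).

    `features` is a list of (lo, hi) pairs, each a pair of equal-length lists.
    """
    rels = [fuzzy_dominance(lo, hi, scale) for lo, hi in features]
    ent = [rank_entropy(r) for r in rels]
    selected = [max(range(len(rels)), key=lambda f: ent[f])]
    while len(selected) < min(n_select, len(rels)):
        def score(f):
            red = sum(rank_mutual_info(rels[f], rels[s]) for s in selected)
            return ent[f] - red / len(selected)
        rest = [f for f in range(len(rels)) if f not in selected]
        selected.append(max(rest, key=score))
    return selected

# Toy data: four objects, four features given as (lower bounds, upper bounds).
f0 = ([0, 1, 2, 3], [1, 2, 3, 4])
f1 = ([0, 1, 2, 3], [1, 2, 3, 4])  # duplicate of f0, fully redundant
f2 = ([3, 0, 2, 1], [4, 1, 3, 2])  # orders the objects differently
f3 = ([0, 0, 0, 0], [1, 1, 1, 1])  # constant, carries no ordering information
selected = umimr_select([f0, f1, f2, f3], n_select=2)
print(sorted(selected))  # prints [0, 2]
```

On this toy data the duplicate and the constant feature both score zero in the second round, so the sketch keeps one copy of the repeated feature plus the feature that orders the objects differently. The published criterion may weight information and redundancy differently.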
Received: 06 May 2017
Fund: Supported by the National Natural Science Foundation of China (No. 61473259, 61070074), the National Science & Technology Support Program of China (No. 2015BAK26B00, 2015BAK26B02), and the BEIYANG Young Scholars Program of Tianjin University (No. 2016XRX-0001)
About the authors: YAN Yuejun, born in 1992, master student. Her research interests include rough set theory. DAI Jianhua (corresponding author), born in 1977, Ph.D., professor. His research interests include artificial intelligence, machine learning, data mining and soft computing.